docs: verify reduction #841 — NAE 3-SAT → Set Splitting#983
Closed
docs: verify reduction #841 — NAE 3-SAT → Set Splitting#983
Conversation
…uctions
New skill: /verify-reduction <issue-number>
End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)
Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.
Strict quality gates (zero tolerance):
- No "trivial" category — every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories
Developed and validated through PR #975 (800K+ checks, 3 bugs caught)
and tested on issues #868 (caught wrong example) and #841 (35K checks).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…kill
- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP",
"NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged — same strictness, better presentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces Lean-required gates with adversarial second agent: - Step 5: Adversary agent independently implements reduce() and extract_solution() from theorem statement only (not constructor's script) - Step 5c: Cross-comparison of both implementations on 1000 instances - Lean downgraded from required to optional - hypothesis property-based testing for n up to 50 - Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md): - Same agent writing proof + test is the #1 risk for AI verification - Two independent implementations agreeing > one + trivial Lean - Lean caught 0 bugs in PR #975; Python caught all 4 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Typst: Construction + Correctness (⟹ + ⟸) + Extraction + Overhead + YES/NO examples Constructor: 41,107 checks, 0 failures (exhaustive n=3, sampled n=4,5, 7 sections) Adversary: 19,832 checks, 0 failures (independent impl + 3 hypothesis strategies) Cross-comparison: 900 instances, all agree, 0 feasibility mismatches Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #983 +/- ##
=======================================
Coverage 98.03% 98.03%
=======================================
Files 784 784
Lines 82310 82310
=======================================
Hits 80695 80695
Misses 1615 1615 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Verification Report
Test plan
python3 docs/paper/verify-reductions/verify_nae3sat_setsplitting.py→ 0 failurespython3 docs/paper/verify-reductions/adversary_nae3sat_setsplitting.py→ 0 failurespython3 docs/paper/verify-reductions/cross_compare_nae3sat_setsplitting.py→ 0 disagreementsCloses #841
🤖 Generated with Claude Code